OCR exploration server

Efficient graph-based dictionary search and its application to text-image searching

Internal identifier: 001A38 (Main/Exploration); previous: 001A37; next: 001A39

Author: Simon Lucas [United Kingdom]

Source: Pattern Recognition Letters, vol. 22, no. 5, pp. 551-562 (2001), Elsevier, ISSN 0167-8655

RBID : ISTEX:939DD747377DD9DF8FB2CB122C3C52C8F8D97BD5

Abstract

This paper describes a novel method for applying dictionary knowledge to optimally interpret the confidence-rated hypothesis sets produced by lower-level pattern classifiers. This problem arises whenever image or video databases need to be scanned for textual content, and where some of the text strings are expected to be strings from a dictionary. The method is especially appropriate for large dictionaries, as might occur in vehicle registration number recognition for example. The problem is cast as enumerating the paths in a graph in best-first order given the constraint that each complete path is a word in some specified dictionary. The solution described here is of particular interest due to its generality, flexibility and because the time to retrieve each path is independent of the size of the dictionary. Synthetic results are presented for searching dictionaries of up to 1 million UK postcodes given graphs that correspond to insertion, deletion and substitution errors. We also present the initial results from processing real noisy text images.
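The abstract's central idea, enumerating dictionary-constrained paths in best-first order, can be illustrated with a small sketch. This is not the algorithm from the paper: it is a plain uniform-cost search over a trie, it handles substitution hypotheses only (no insertions or deletions), and it makes no claim about retrieval time being independent of dictionary size. The names `build_trie` and `best_first_words`, the per-position confidence dictionaries and the negative-log cost are all assumptions made for illustration.

```python
import heapq
import math
from itertools import count


def build_trie(words):
    """Store the dictionary as nested dicts; the key None marks a word end."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def best_first_words(hypotheses, trie):
    """Yield (cost, word) pairs in order of increasing cost.

    hypotheses[i] maps each candidate character at position i to a
    confidence in (0, 1]. The cost of a word is the sum of
    -log(confidence) over its characters (an illustrative choice).
    """
    tie = count()  # unique tie-breaker so the heap never compares trie nodes
    heap = [(0.0, next(tie), 0, "", trie)]
    while heap:
        cost, _, pos, prefix, node = heapq.heappop(heap)
        if pos == len(hypotheses):
            # A complete path counts only if it ends on a dictionary word.
            if None in node:
                yield cost, prefix
            continue
        for ch, conf in hypotheses[pos].items():
            child = node.get(ch)
            if child is not None:  # prune paths that leave the dictionary
                heapq.heappush(
                    heap,
                    (cost - math.log(conf), next(tie), pos + 1, prefix + ch, child),
                )


# Hypothetical classifier output for a three-character string.
trie = build_trie(["CAT", "COT", "CUT", "DOG"])
hypotheses = [
    {"C": 0.9, "D": 0.1},
    {"A": 0.5, "O": 0.4, "U": 0.1},
    {"T": 0.8, "G": 0.2},
]
ranked = [word for _, word in best_first_words(hypotheses, trie)]
```

Because every step cost is non-negative, complete words leave the priority queue in order of non-decreasing total cost, so a caller can stop after the first few results without exploring the remaining paths.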

Url: https://api.istex.fr/document/939DD747377DD9DF8FB2CB122C3C52C8F8D97BD5/fulltext/pdf
DOI: 10.1016/S0167-8655(00)00117-3


Affiliations: Department of Computer Science, University of Essex, Colchester CO4 3SQ, United Kingdom



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title>Efficient graph-based dictionary search and its application to text-image searching</title>
<author>
<name sortKey="Lucas, Simon" sort="Lucas, Simon" uniqKey="Lucas S" first="Simon" last="Lucas">Simon Lucas</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:939DD747377DD9DF8FB2CB122C3C52C8F8D97BD5</idno>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1016/S0167-8655(00)00117-3</idno>
<idno type="url">https://api.istex.fr/document/939DD747377DD9DF8FB2CB122C3C52C8F8D97BD5/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000E43</idno>
<idno type="wicri:Area/Istex/Curation">000E08</idno>
<idno type="wicri:Area/Istex/Checkpoint">001092</idno>
<idno type="wicri:doubleKey">0167-8655:2001:Lucas S:efficient:graph:based</idno>
<idno type="wicri:Area/Main/Merge">001B31</idno>
<idno type="wicri:Area/Main/Curation">001A38</idno>
<idno type="wicri:Area/Main/Exploration">001A38</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a">Efficient graph-based dictionary search and its application to text-image searching</title>
<author>
<name sortKey="Lucas, Simon" sort="Lucas, Simon" uniqKey="Lucas S" first="Simon" last="Lucas">Simon Lucas</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Royaume-Uni</country>
<wicri:regionArea>Department of Computer Science, University of Essex, Colchester CO4 3SQ</wicri:regionArea>
<wicri:noRegion>Colchester CO4 3SQ</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Royaume-Uni</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Pattern Recognition Letters</title>
<title level="j" type="abbrev">PATREC</title>
<idno type="ISSN">0167-8655</idno>
<imprint>
<publisher>ELSEVIER</publisher>
<date type="published" when="2001">2001</date>
<biblScope unit="volume">22</biblScope>
<biblScope unit="issue">5</biblScope>
<biblScope unit="page" from="551">551</biblScope>
<biblScope unit="page" to="562">562</biblScope>
</imprint>
<idno type="ISSN">0167-8655</idno>
</series>
<idno type="istex">939DD747377DD9DF8FB2CB122C3C52C8F8D97BD5</idno>
<idno type="DOI">10.1016/S0167-8655(00)00117-3</idno>
<idno type="PII">S0167-8655(00)00117-3</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0167-8655</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This paper describes a novel method for applying dictionary knowledge to optimally interpret the confidence-rated hypothesis sets produced by lower-level pattern classifiers. This problem arises whenever image or video databases need to be scanned for textual content, and where some of the text strings are expected to be strings from a dictionary. The method is especially appropriate for large dictionaries, as might occur in vehicle registration number recognition for example. The problem is cast as enumerating the paths in a graph in best-first order given the constraint that each complete path is a word in some specified dictionary. The solution described here is of particular interest due to its generality, flexibility and because the time to retrieve each path is independent of the size of the dictionary. Synthetic results are presented for searching dictionaries of up to 1 million UK postcodes given graphs that correspond to insertion, deletion and substitution errors. We also present the initial results from processing real noisy text images.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Royaume-Uni</li>
</country>
</list>
<tree>
<country name="Royaume-Uni">
<noRegion>
<name sortKey="Lucas, Simon" sort="Lucas, Simon" uniqKey="Lucas S" first="Simon" last="Lucas">Simon Lucas</name>
</noRegion>
<name sortKey="Lucas, Simon" sort="Lucas, Simon" uniqKey="Lucas S" first="Simon" last="Lucas">Simon Lucas</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001A38 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001A38 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:939DD747377DD9DF8FB2CB122C3C52C8F8D97BD5
   |texte=   Efficient graph-based dictionary search and its application to text-image searching
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024